
    Improved ROI and within-frame discriminant features for lipreading
    Proceedings of the International Conference on Image Processing, Thessaloniki, 2001

    No full text
    We study three aspects of designing appearance-based visual features for automatic lipreading: (a) the choice of the video region of interest (ROI) on which image transform features are obtained; (b) the extraction of speech-discriminant features at each frame; and (c) the use of temporal information to improve visual speech modeling. In particular, with respect to (a), we propose a ROI that includes the speaker's jaw and cheeks, in addition to the traditionally used mouth/lip region; with respect to (b) and (c), we propose the use of a two-stage linear discriminant analysis, both within frame and across a large number of frames. On a large-vocabulary, continuous-speech audio-visual database, the proposed visual features result in a 13% absolute reduction in visual-only word error rate over a baseline visual front end, and in an additional 28% relative improvement in audio-visual over audio-only phonetic classification accuracy.
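
    As a rough illustration of the two-stage discriminant analysis, the sketch below applies a within-frame LDA to per-frame ROI transform features, stacks a window of projected frames, and applies a second, across-frame LDA. All dimensions, window lengths, and data are hypothetical placeholders, and scikit-learn's LinearDiscriminantAnalysis stands in for the paper's discriminant computation; this is not the paper's actual configuration.

    # Minimal sketch of two-stage LDA visual features; sizes and data
    # are hypothetical, not the configuration used in the paper.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    n_frames, dct_dim, n_classes = 2000, 100, 10   # hypothetical sizes
    X = rng.normal(size=(n_frames, dct_dim))       # per-frame ROI transform features
    y = rng.integers(0, n_classes, size=n_frames)  # per-frame speech-class labels

    # Stage 1: within-frame LDA compresses each frame's features.
    lda_within = LinearDiscriminantAnalysis(n_components=n_classes - 1)
    Z = lda_within.fit_transform(X, y)             # shape (n_frames, n_classes - 1)

    # Stack J consecutive projected frames to capture temporal context.
    J = 15                                         # hypothetical window length
    half = J // 2
    valid = np.arange(half, n_frames - half)
    stacked = np.stack([Z[t - half : t + half + 1].ravel() for t in valid])

    # Stage 2: across-frame LDA on the stacked temporal windows.
    lda_across = LinearDiscriminantAnalysis(n_components=n_classes - 1)
    visual_features = lda_across.fit_transform(stacked, y[valid])
    print(visual_features.shape)                   # (len(valid), n_classes - 1)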

    Noisy audio feature enhancement using audio-visual speech data

    No full text
    We investigate improving automatic speech recognition (ASR) in noisy conditions by enhancing noisy audio features using visual speech captured from the speaker's face. The enhancement is achieved by applying a linear filter, obtained by mean-square-error estimation of the clean audio features in a training stage, to the concatenated vector of noisy audio and visual features. The performance of the enhanced audio features is evaluated on two ASR tasks: a connected-digits task and speaker-independent, large-vocabulary, continuous speech recognition. In both cases, at sufficiently low signal-to-noise ratios (SNRs), ASR trained on the enhanced audio features significantly outperforms ASR trained on the noisy audio, achieving for example a 46% relative reduction in word error rate on the digits task at −3.5 dB SNR. However, the method fails to capture the full visual-modality benefit to ASR, as demonstrated by comparison to the discriminant audio-visual feature fusion introduced in previous work.
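
    The enhancement amounts to a linear minimum-mean-square-error estimator: in training, a matrix is fit that maps the concatenated noisy-audio-plus-visual vector to the clean audio features, and the same matrix is then applied to new observations. The following is a minimal sketch under assumed, synthetic data; the dimensions and the least-squares fit are placeholders for the paper's actual training procedure.

    # Linear MSE enhancement of noisy audio features from concatenated
    # audio-visual input; data and sizes are synthetic placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d_audio, d_visual = 5000, 60, 41                  # hypothetical sizes
    clean = rng.normal(size=(n, d_audio))                # clean audio features (targets)
    noisy = clean + 0.5 * rng.normal(size=(n, d_audio))  # noisy audio features
    # Crude stand-in for visual speech features correlated with the clean audio.
    visual = clean[:, :d_visual] + 0.8 * rng.normal(size=(n, d_visual))

    # Concatenate noisy audio and visual features; append a bias term.
    X = np.hstack([noisy, visual, np.ones((n, 1))])

    # Training stage: least-squares fit of the linear filter W, i.e. the
    # MSE-optimal linear estimate of the clean features from X.
    W, *_ = np.linalg.lstsq(X, clean, rcond=None)

    # Enhancement stage: apply W to the concatenated observations.
    enhanced = X @ W
    print("MSE noisy:   ", float(np.mean((noisy - clean) ** 2)))
    print("MSE enhanced:", float(np.mean((enhanced - clean) ** 2)))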

    Speaker change detection using joint audio-visual statistics

    No full text
    In this paper, we present an approach for speaker change detection in broadcast video using joint audio-visual scene change statistics. Our experiments indicate that using joint audio-visual statistics achieves better recall without loss of precision compared to purely audio-domain approaches to speaker change detection.
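
    One way to realize joint audio-visual statistics is to fuse an audio change score with a visual scene-change score before thresholding. The sketch below is purely illustrative: the divergence measure, the frame-difference score, the fusion weight, and the threshold are all assumptions, not the statistics used in the paper.

    # Illustrative fusion of audio and visual change statistics for
    # speaker change detection; scores and thresholds are hypothetical.
    import numpy as np

    def audio_change_score(feats, t, w):
        """Symmetric KL-style distance between diagonal Gaussians fit to
        the w audio frames before and after candidate point t."""
        a, b = feats[t - w : t], feats[t : t + w]
        mu_a, mu_b = a.mean(0), b.mean(0)
        var_a, var_b = a.var(0) + 1e-6, b.var(0) + 1e-6
        d = var_a / var_b + var_b / var_a \
            + (mu_a - mu_b) ** 2 * (1 / var_a + 1 / var_b)
        return 0.5 * d.sum()

    def visual_change_score(frames, t):
        """Mean absolute difference between consecutive video frames."""
        return np.abs(frames[t] - frames[t - 1]).mean()

    rng = np.random.default_rng(0)
    T, d, w = 400, 13, 30
    audio = rng.normal(size=(T, d))
    audio[200:] += 2.0                     # simulated speaker change at t = 200
    video = rng.normal(size=(T, 16, 16))
    video[200:] += 1.0                     # simulated shot change at the same point

    for t in range(w, T - w):
        # Fuse the two statistics with a hypothetical weight and threshold.
        s = audio_change_score(audio, t, w) + 5.0 * visual_change_score(video, t)
        if s > 60.0:
            print("candidate speaker change at frame", t)
            break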
